Skip to content

Comments

HDFS-16631. Enable dfs.datanode.lockmanager.trace In Test.#4438

Closed
slfan1989 wants to merge 6 commits intoapache:trunkfrom
slfan1989:HDFS-16631
Closed

HDFS-16631. Enable dfs.datanode.lockmanager.trace In Test.#4438
slfan1989 wants to merge 6 commits intoapache:trunkfrom
slfan1989:HDFS-16631

Conversation

@slfan1989
Copy link
Contributor

@slfan1989 slfan1989 commented Jun 14, 2022

JIRA:HDFS-16631. Enable dfs.datanode.lockmanager.trace In Test.

In Jira HDFS-16600. Fix deadlock on DataNode side. We discussed the issue of deadlock, this is a very meaningful discussion, I was reading the log and found the following:

PR:#4367

2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) -
 not open lock leak check func.

Looking at the code, I found that there is such a parameter:

<property>
    <name>dfs.datanode.lockmanager.trace</name>
    <value>false</value>
    <description>
      If this is true, after shut down datanode lock Manager will print all leak
      thread that not release by lock Manager. Only used for test or trace dead lock
      problem. In produce default set false, because it's have little performance loss.
    </description>
  </property> 

I think this parameter should be added in the test environment, so that if there is a DN deadlock, the cause can be quickly located.

If my understanding is correct, if a thread needs both read locks and write locks, if this parameter is true, relevant thread information can be printed.

@slfan1989
Copy link
Contributor Author

@Hexiaoqiao @MingXiangLi @ZanderXu please help me review the code.

@MingXiangLi
Copy link
Contributor

DataSetLockManager only print lock trace when invoke DataNode.shutdown() or dataSetLockManager.lockLeakCheck().So I doubt it will work in all UT.

@MingXiangLi
Copy link
Contributor

Ans should we throw Exception ? Most of user will ignore if just print the log.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Jun 14, 2022

DataSetLockManager only print lock trace when invoke DataNode.shutdown() or dataSetLockManager.lockLeakCheck().So I doubt it will work in all UT.

Thanks for the suggestion, I think this parameter can cover scenarios like HDFS-16600, DN shutdown will definitely be called in Junit Test, I personally feel that if there is an error in the junit test of DN and if we suspect a deadlock, we can see this print message.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Jun 14, 2022

Ans should we throw Exception ? Most of user will ignore if just print the log.

If a thread obtains a read lock (write lock) and needs to obtain a write lock (read lock), will an exception be thrown directly?
My understanding is that once a deadlock occurs, the code will definitely report an error, and there is no need to throw an exception directly. I personally feel that printing is enough. of course, we can also throw exceptions directly.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 16s trunk passed
+1 💚 shadedclient 61m 55s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shadedclient 22m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 28s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
91m 20s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/1/artifact/out/Dockerfile
GITHUB PR #4438
Optional Tests dupname asflicense unit codespell detsecrets xmllint
uname Linux cd4c75a4e6c9 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a9933a1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/1/testReport/
Max. process+thread count 520 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/1/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ZanderXu
Copy link
Contributor

@slfan1989 It's a good idea. But I personally feel that it would be nice to thrown exception directly in the test environment when there is a lock leak.

@slfan1989
Copy link
Contributor Author

slfan1989 commented Jun 15, 2022

@MingXiangLi @ZanderXu @Hexiaoqiao Thanks for helping to review the code, can I make the following changes?

public void lockLeakCheck() throws Exception {
    if (!openLockTrace) {
      LOG.warn("not open lock leak check func");
      return;
    }
    if (threadCountMap.isEmpty()) {
      LOG.warn("all lock has release");
      return;
    }
    setLastException(new Exception("lock Leak"));
    threadCountMap.forEach((name, trackLog) -> trackLog.showLockMessage());
    // throw exception ?
    throw new Exception("lock Leak");
  }

@Hexiaoqiao
Copy link
Contributor

From my side, I do not think enable lock trace only is good idea for tests as @MingXiangLi has mentioned above. The only INFO level log will not help to debug or test. IMO, if there are some cases we would like to cover and need to collect locks information, it is better to add some inject logic. FYI.

@slfan1989
Copy link
Contributor Author

From my side, I do not think enable lock trace only is good idea for tests as @MingXiangLi has mentioned above. The only INFO level log will not help to debug or test. IMO, if there are some cases we would like to cover and need to collect locks information, it is better to add some inject logic. FYI.

Thank you very much for your suggestion, I will think about how to collect lock information!

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 27s trunk passed
+1 💚 shadedclient 62m 57s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 26s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shadedclient 21m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 26s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
92m 8s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/2/artifact/out/Dockerfile
GITHUB PR #4438
Optional Tests dupname asflicense unit codespell detsecrets xmllint
uname Linux e09f249c6203 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 438879f
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/2/testReport/
Max. process+thread count 522 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/2/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 28s trunk passed
+1 💚 shadedclient 61m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shadedclient 21m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 27s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
91m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/3/artifact/out/Dockerfile
GITHUB PR #4438
Optional Tests dupname asflicense unit codespell detsecrets xmllint
uname Linux 2672a54c0019 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 30bf6ed
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/3/testReport/
Max. process+thread count 550 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/3/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@slfan1989
Copy link
Contributor Author

readLock

getVolume(final ExtendedBlock b)
getStoredBlock(String bpid, long blkid)
Set<? extends Replica> deepCopyReplica(String bpid)
getBlockInputStream(ExtendedBlock b, long seekOffset)
moveBlockAcrossStorage(ExtendedBlock block, StorageType targetStorageType, String targetStorageId)
moveBlockAcrossVolumes(ExtendedBlock block, FsVolumeSpi destination)
ReplicaHandler createRbw(StorageType storageType, String storageId, ExtendedBlock b, boolean allowLazyPersist)
Map<DatanodeStorage, BlockListAsLongs> getBlockReports(String bpid)
public List<ReplicaInfo> getFinalizedBlocks(String bpid)
public boolean contains(final ExtendedBlock block)
public String getReplicaString(String bpid, long blockId)
public long getReplicaVisibleLength(final ExtendedBlock block)
public BlockLocalPathInfo getBlockLocalPathInfo(ExtendedBlock block)

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 47m 25s trunk passed
+1 💚 shadedclient 69m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shadedclient 22m 31s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 1m 26s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
99m 33s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/4/artifact/out/Dockerfile
GITHUB PR #4438
Optional Tests dupname asflicense unit codespell detsecrets xmllint
uname Linux f79c1a23757e 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5dac9ee
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/4/testReport/
Max. process+thread count 524 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4438/4/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@slfan1989 slfan1989 closed this Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants